Image Classification Based on Light Convolutional Neural Network Using Pulse Couple Neural Network

Recently, most image classification studies have relied on convolutional neural networks, because these DL-based classification methods generally achieve higher accuracy than other methodologies. However, this type of deep learning network requires many parameters and has a complex structure with multiple convolutional and pooling layers, depending on the objective. These layers process a large volume of data, which may affect processing time and performance. Therefore, this paper proposes a new image classification method based on a light convolutional neural network. It replaces the feature extraction layers of a standard convolutional neural network with a single pulse coupled neural network and introduces the notion of foveation. This module provides the feature map of the input image, followed by data compression using the Discrete Wavelet Transform, which is an optional step depending on the quantity of information in this signature. A fully connected neural network with six hidden layers classifies the image. With this technique, the computation time is reduced, and the network architecture stays identical and simple regardless of the type of dataset. The number of parameters is lower than in current research. The proposed method was validated on different datasets, namely Caltech-101, Caltech-256, CIFAR-10, CIFAR-100, and ImageNet, and the accuracy reaches 92%, 90%, 99%, 94%, and 91%, respectively, which is better than previous related works.


Introduction
For a software developer, searching for an image in a database by keyword is a big challenge, and the appropriate solution is to associate a label with every existing image. Finding a labelled image in a database with an indexed table makes the task easier. This labelling operation is usually called image classification, which refers to a computer vision process that classifies an image according to its visual content. Human vision is a perfect solution for image recognition; however, we cannot allocate human resources to accomplish this task, so automation is required.
The CNN, or convolutional neural network, is a deep learning model inspired by the organization of the animal visual cortex. It is used for processing data that has a grid pattern, such as images [1][2][3], and is designed to automatically and adaptively learn spatial hierarchies of features, from low- to high-level patterns. Convolution, pooling, and fully connected layers are the three types of layers that constitute a CNN. Feature extraction is performed by the convolution and pooling layers (the first two types), whereas the third, the fully connected layer, maps the extracted features to the final output for classification. Most recent works on image classification use CNNs to obtain good results.
In 2014, GoogLeNet-19, developed by Google [4], was ranked first, using 4 million parameters with a top-5 error rate of 6.67%; VGGNet-16, created by Simonyan and Zisserman [4], came second, with 138 million parameters and a top-5 error rate of 7.3%. Managing this many parameters is clearly difficult when the number of layers is high. So, in this paper, we propose an efficient approach with minimum computation time, minimum parameters, and a minimum number of layers to classify images, based on the light convolutional neural network (LCNN). To accomplish this, we suggest swapping the convolution and pooling layers of the CNN for a single layer of pulse coupled neural network (PCNN) plus a foveation contribution, and an optional feature representation by the discrete wavelet transform (DWT). (When we look at an image, we do not stare at it for a long time; we focus only on the pertinent information. This behavior of the human visual cortex is called "foveation".) The fully connected layer remains the same but with a minimum of neurons and hidden layers. To validate our method, we applied it to five databases with different classes and compared the results with several recent state-of-the-art methods. The main contributions of this work are as follows:
(i) The proposed image classification system has a simple architecture, and the topology remains unchanged regardless of the input image. Owing to this simplicity, the quantity of data to process is reduced compared with a CNN, which yields an optimal computation time. Such a solution may be supported by embedded systems.
(ii) Related to the first contribution, the approach works with a minimum number of parameters, that is, less than 20.
(iii) Foveation collects the pertinent information to facilitate the construction of the image signature. It is a simple process compared with the succession of convolution and pooling operations used by a CNN.
(iv) DWT reduces the size (row × column) of the image map in order to have a minimum number of neurons in the deep learning network.
(v) The approach provides high accuracy, greater than or equal to that of CNN-based techniques, even though the proposed architecture is very simple.
The rest of the paper is organized as follows: Section 2 summarizes recent works related to our proposed approach. Section 3 describes the mathematical model of the PCNN. The proposed method is presented in Section 4, followed by experimental results in Section 5 and a discussion in Section 6. Finally, Section 7 concludes the paper with the research motivation. To help the reader, Table 1 presents the list of abbreviations and definitions.

Literature Review
Ferraz and Gonzaga [5] introduced a study focused on object classification based on a local texture descriptor and a support vector machine. Two new texture descriptors were recently proposed for object detection based on the Local Mapped Pattern (LMP) approach. The Center-Symmetric Local Mapped Pattern (CS-LMP) and Mean-Local Mapped Pattern (M-LMP) exhibit better performance than SIFT and CS-LBP, but prior results have shown that the size of the descriptors could be decreased without loss of sensitivity. In their research, they investigated reducing the size of the M-LMP descriptor and measured performance with a support vector machine (SVM) classifier for object classification. In those experiments, they applied an object recognition system based on the reduced M-LMP descriptor and compared the results with the CS-LMP, Local Intensity Order Pattern (LIOP), and SIFT descriptors. The object classification results, obtained with a Bag of Features (BoF) model and an SVM classifier, show that the reduced descriptor performs better than the other three well-known techniques tested and also requires less processing time. The experiments were done with the Caltech-101 and ImageNet datasets, and the performance was good except for the background Google class, because the feature extraction drops some sensitive information and leads to wrong deductions. This research can be compared with the study by Srivastava et al. [6], because both have the same objective and use the common Caltech-101 dataset to validate their experiments. The latter is a new concept of image classification using a bag of LBP features constructed by clustering with fixed centers. The authors of [7] proposed a new CNN technique that classifies images without difficulty compared with other traditional models and achieves better overall performance.
With this method, the useful feature representation of a pretrained network can be efficiently transferred to the target task, and the original dataset can be augmented with the most valuable Internet images for classification. The method not only greatly reduces the requirement for large training data but also effectively enlarges the training dataset. Both capabilities contribute to a considerable reduction of the overfitting of deep CNNs on a small dataset. In addition, the authors successfully apply Bayesian optimization to the tough problem of hyper-parameter tuning in network fine-tuning. The approach is applied to six public small datasets. Extensive experiments show that, compared with conventional methods, the solution helps popular deep-learning CNNs achieve better performance. In particular, ResNet can outperform all the state-of-the-art models on the six small datasets. The experimental results show that the proposed solution can be a remarkable tool for dealing with practical problems related to using deep CNNs on a small dataset; however, the accuracy decreases once the approach is applied to a large dataset or to a dataset with many classes.
Çalik and Demirci [8] presented an image classification approach for embedded systems. The challenge was to apply a CNN on a device with limited memory, and the result was 85.9% accuracy on the CIFAR-10 dataset with a memory allocation of 2 GB. The limitation of this method is the same as that of Srivastava et al. [6]: it has difficulty training on a big dataset. Dhouibi [9] published a paper on the optimization of the CNN model for image classification. It addresses the topology optimization of a CNN in terms of the number of layers and the number of neurons per layer. This optimal solution makes it possible to reduce the model and deploy it on embedded platforms. This research was evaluated on the same dataset and obtained 82.43% accuracy. A third experiment with the CIFAR-10 dataset is presented by Sharma and Phonsa [10]. They used the sequential method for the CNN and implemented the program in a Jupyter notebook. They took 3 classes, namely airplane, bird, and car, and classified them using the CNN with a batch size of 64. They obtained 94% accuracy for the 3 classes.
Wang and Sun [11] present a new method of image classification using a CNN with wavelet domain inputs. The idea is to replace the first several convolutional layers of the feature extraction part of a standard CNN with a wavelet packet transform or a dual-tree complex wavelet transform. These wavelet transforms provide a higher resolution of the image in the preprocessing step. The advantage is that the essential information present in the image is kept to ensure a correct classification, because with a CNN, some important information may be lost during the convolution calculation. In the experiments, the Caltech-256 and DTD datasets are used with ResNet-50, and the maximum accuracy improvement is 2.15% and 10.26%, respectively. We now turn to the methods that use the ImageNet dataset, regarded as the largest image database in this area.
(i) Xception [12], or Extreme Inception, is an improved version of the Inception CNN model. Two levels are present in this design. The first level is composed of a single layer that slices the output into 3 segments and sends them to the next filters, whose convolution levels are 1 * 1 and 3 * 3, respectively. The depth-wise separable convolution [13][14][15] is the component that defines the Xception model. This technique is used for image classification over a wide range of images with hundreds of classes (79% accuracy on the ImageNet dataset).
(ii) VGG16 [12], which is inspired by AlexNet, has 16 layers, 3 of which are fully connected. In between, there are 5 max pooling layers; Softmax is the output activation function [16][17][18], and ReLU is used for the hidden layers. VGG19 [19] follows the same concept as VGG16; however, this CNN contains 19 layers, with 3 fully connected layers for classification and 16 convolution layers for feature extraction. The top-1 accuracy for both is 71.3%.
(iii) ResNet152V2 and MobileNetV2 [20] are well-known pretrained deep learning CNNs. They are specialized in feature extraction, prediction, and classification. The architecture of MobileNetV2 is formed by a full convolution layer with 32 filters and 19 residual bottleneck layers. ResNet152V2 has hundreds or thousands of convolution layers, and its particularity compared with the previous version is that it applies batch normalization before each weight layer. The recognition rates obtained on the ImageNet dataset are 78.0% and 71.3%, respectively.
Computational Intelligence and Neuroscience
(iv) NASNetLarge is a generation of CNN able to train on more than a million pictures from the ImageNet dataset and classify more than a thousand objects. An input image of this network has a size of 331 × 331, and the strong point of this design is that it has learned rich feature representations for a wide range of images. Experiments show that the final accuracy reaches 82.5%. On the other hand, EfficientNetB7 [21] achieves 84.3%. EfficientNetB7 is a release of EfficientNet, a lightweight NAS-based network created by Google in 2019.
What these studies have in common is the ambition to optimize the standard CNN. Each has its own methodology for extracting image features to reach that goal. Concerning the classification layer, some stay with one or more fully connected neural networks, while others bring in an SVM. They are selected as the state of the art in this paper because their objectives, and even their experimental datasets, are similar to ours, which makes it possible to compare performance.

Pulse Coupled Neural Network
According to the presentation by Srinivasan et al. [22], the PCNN is inspired by the behavior of the cat visual cortex. The model architecture is composed of three parts, namely, the dendritic tree, the linking modulation, and the pulse generator. The first part has two types of inputs, namely, feeding and linking. The feeding input receives both local and external stimuli, whereas the linking input captures only the local stimulus. The second part, the linking modulation, combines the outputs of the two channels by adding a bias to the linking and multiplying it by the feeding. The internal state of the neuron, U_j, is the result of this combination, and this internal state and the threshold allow the last part, the pulse generator, to generate the pulse.
Lo et al. [23] introduced the PCNN in the image processing area, and the mathematical model is defined below. Table 2 explains the meaning of the different PCNN parameters.
(i) First part (dendritic tree): equation (1).
(ii) Second part (linking modulation): equation (2).
(iii) Last part (pulse generator): The internal state of the neuron is compared to a dynamic threshold, Θ, to produce the output, Y, according to equation (3). The threshold is dynamic in that when the neuron fires (U > Θ), the threshold significantly increases its value [23]. This value then decays until the neuron fires again. This process is described by equation (4).
According to equation (3), the output is binary, so there are many candidates for the foveation points, because with the standard PCNN the pulse generator module uses a threshold function with output 0 or 1. This issue can be solved by adopting the sigmoid pulse generator defined in equation (5) [24,25].
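For reference, the three parts described above are commonly written in the PCNN literature in the following discrete form. This is a sketch of the standard formulation under stated assumptions, not a reproduction of the paper's own equations: the symbols \alpha_F, \alpha_L, \alpha_\Theta (decay constants), V_F, V_L, V_\Theta (normalization constants), M and W (weight kernels), \beta (linking strength), and \gamma (sigmoid slope) are assumed names that may differ from those in Table 2, and the time index used for the threshold varies between authors.

```latex
\begin{align*}
F_{ij}[n] &= e^{-\alpha_F} F_{ij}[n-1] + V_F \sum_{k,l} M_{ijkl}\, Y_{kl}[n-1] + S_{ij}
  && \text{(feeding)} \\
L_{ij}[n] &= e^{-\alpha_L} L_{ij}[n-1] + V_L \sum_{k,l} W_{ijkl}\, Y_{kl}[n-1]
  && \text{(linking)} \\
U_{ij}[n] &= F_{ij}[n] \bigl(1 + \beta L_{ij}[n]\bigr)
  && \text{(linking modulation)} \\
Y_{ij}[n] &= \begin{cases} 1, & U_{ij}[n] > \Theta_{ij}[n-1] \\ 0, & \text{otherwise} \end{cases}
  && \text{(binary pulse generator)} \\
\Theta_{ij}[n] &= e^{-\alpha_\Theta} \Theta_{ij}[n-1] + V_\Theta\, Y_{ij}[n]
  && \text{(dynamic threshold)} \\
Y_{ij}[n] &= \frac{1}{1 + e^{-\gamma \left(U_{ij}[n] - \Theta_{ij}[n-1]\right)}}
  && \text{(sigmoid pulse generator)}
\end{align*}
```

With the sigmoid variant, the output takes values strictly between 0 and 1 instead of a hard 0 or 1, which is what allows the foveation points to be ranked rather than merely listed.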

Proposed Method
Now that the PCNN, one element of the image classification method, has been described, the wavelet transform and the fully connected neural network (FCNN) are explained briefly where they intervene in the approach. The proposed system has two modules, namely, feature extraction and deep learning, and an overview of the approach is shown in Figure 2.

Feature Extraction.
The first step is to choose the image dataset and split it into two parts, namely, training and validation. Every image in the database must be converted to grayscale and, optionally, resized, because the PCNN can only process a single-channel matrix, not a three-channel one such as an RGB image. Image resizing is applicable only when the image has large dimensions. Apart from color conversion, the preprocessing module has two filters, namely, the Canny filter and the blurring filter. The reason for this choice is to reduce the quantity of information to be processed. The Canny filter is an edge detection operator that uses a multistage algorithm to detect a wide range of edges in images. It was developed by John F. Canny [27] in 1986. The blurring filter [27,28] is a low-pass filter: it lets low frequencies pass and stops high frequencies. Here, frequency means the rate of change of pixel values. Pixel values change rapidly around edges, whereas a blurred image is smooth, so high frequencies are filtered out. Figure 3 shows these details.
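The color conversion and low-pass filtering described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the luminance weights, the 3 × 3 box kernel, and the function names `to_grayscale` and `box_blur` are assumptions, and the Canny step is left out (in practice it would typically come from a library routine such as OpenCV's `cv2.Canny`).

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an H x W x 3 uint8 RGB array to a single-channel float image in [0, 1]."""
    weights = np.array([0.299, 0.587, 0.114])  # common luminance weights (assumed)
    return (rgb.astype(np.float64) @ weights) / 255.0

def box_blur(gray, k=3):
    """Low-pass filter: replace each pixel by the mean of its k x k neighborhood."""
    pad = k // 2
    padded = np.pad(gray, pad, mode="edge")    # repeat border pixels
    rows, cols = gray.shape
    out = np.zeros((rows, cols), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + rows, dx:dx + cols]
    return out / (k * k)

# Illustrative input: a random 32 x 32 RGB image
rgb = np.random.default_rng(0).integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
gray = to_grayscale(rgb)
blurred = box_blur(gray)
```

A box filter is the simplest low-pass choice; a Gaussian kernel would serve the same purpose of suppressing rapid pixel-value changes.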
The PCNN extracts the essential part from the blurred image and eliminates the background noise. A high number of iterations is required to ensure that the PCNN accomplishes its task. Before starting the iterations, the neural network parameters are initialized as follows:
(ii) Initial values of the matrices: The initial values of the linking matrix L, the feeding matrix F, and the stimulus S are equal to the input image. The convolution between a null matrix, which has the same size R × C as the input image, and the weights matrix initializes the output value Y of the PCNN. The initial value of the dynamic threshold Θ is an R-by-C matrix of twos.
(iii) Delay constants
(iv) Normalization constants
The maximum number of iterations is fixed to 40, and the calculation of the percentage of misclassified pixels [29] indicates the image to be selected. The first minimum rate corresponds to excellent image segmentation and the second to edge detection, so we are interested in the second result, shown in Figure 4. Its gray level varies between 0 and 1 because of the sigmoidal pulse generator used by the PCNN.
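The initialization and iteration described above can be sketched as follows. The sketch follows the text for the initial values (F, L, and S equal to the input image, Y obtained from a null matrix, Θ filled with twos, 40 iterations, sigmoid pulse generator), but every numeric constant, the 3 × 3 weight kernel, and the names `neighbor_sum` and `pcnn_sigmoid` are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def neighbor_sum(y, kernel):
    """Correlate y with a 3 x 3 kernel using zero padding (same output size)."""
    padded = np.pad(y, 1, mode="constant")
    rows, cols = y.shape
    out = np.zeros((rows, cols), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + rows, dx:dx + cols]
    return out

def pcnn_sigmoid(stimulus, n_iter=40, beta=0.2, a_f=0.1, a_l=1.0, a_t=0.5,
                 v_f=0.5, v_l=0.2, v_t=20.0, gamma=1.0):
    """Run a PCNN with a sigmoid pulse generator and return the output of each iteration.

    All default constants are placeholders, not values taken from the paper.
    """
    kernel = np.array([[0.5, 1.0, 0.5], [1.0, 0.0, 1.0], [0.5, 1.0, 0.5]])
    s = stimulus.astype(np.float64)
    f, l = s.copy(), s.copy()                    # feeding and linking start as the input
    y = neighbor_sum(np.zeros_like(s), kernel)   # null matrix convolved with the weights
    theta = np.full_like(s, 2.0)                 # dynamic threshold: matrix of twos
    outputs = []
    for _ in range(n_iter):
        link = neighbor_sum(y, kernel)
        f = np.exp(-a_f) * f + v_f * link + s
        l = np.exp(-a_l) * l + v_l * link
        u = f * (1.0 + beta * l)                 # linking modulation
        y = 1.0 / (1.0 + np.exp(-gamma * (u - theta)))  # sigmoid pulse generator
        theta = np.exp(-a_t) * theta + v_t * y   # threshold rises after firing, then decays
        outputs.append(y)
    return outputs

image = np.random.default_rng(0).random((16, 16))
outputs = pcnn_sigmoid(image)
```

Selecting which of the 40 outputs to keep (the paper uses the percentage of misclassified pixels) is not shown here.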
The PCNN task is completed once the relevant information is extracted. Next, we apply the foveation method to collect the data to which human eyes are sensitive. For this, we apply an image threshold and obtain the result shown in Figure 5(a). The dimension of the image should then be reduced (this step is optional if the image is small, such as 32 × 32), which can be done with the Haar Wavelet Transform (HWT). HWT operates simultaneously on spatial and frequency domain information in image processing. It is a transform for which the wavelets are sampled at discrete intervals [30,31]. The Haar wavelet operates on data by calculating the sums and differences of adjacent elements. A simple explanation of how to apply HWT to images is shown in Figure 6. The resulting image is composed of four subbands, namely, the LL, HL, LH, and HH subbands (L = Low, H = High), where the LL-subband contains an approximation of the original image while the other subbands contain the missing details. The LL-subband output from any stage can be decomposed similarly [32].
We apply the HWT three times to the foveation image, and we are interested in the second LL-subband (Figures 5(b)-5(d)). The resulting image is reshaped into a vector that constitutes the input layer of the FCNN.
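The thresholding, Haar decomposition, and reshaping steps can be sketched as below. Only the LL (approximation) subband is computed, since it is the one kept for the signature. The threshold value 0.5, the default of two decomposition levels, the orthonormal scaling, and the names `haar_ll` and `image_signature` are assumptions made for illustration.

```python
import numpy as np

def haar_ll(img):
    """One level of the 2-D Haar transform, returning only the LL subband."""
    rows, cols = img.shape
    img = img[:rows - rows % 2, :cols - cols % 2]   # drop an odd trailing row/column
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    return (a + b + c + d) / 2.0                    # orthonormal Haar scaling

def image_signature(pcnn_output, threshold=0.5, levels=2):
    """Threshold the PCNN output, apply `levels` Haar LL steps, and flatten to a vector."""
    fov = (pcnn_output >= threshold).astype(np.float64)
    for _ in range(levels):
        fov = haar_ll(fov)
    return fov.reshape(-1)

# A 64 x 64 input shrinks to 16 x 16 after two levels, giving a 256-element signature
signature = image_signature(np.ones((64, 64)))
```

Each level halves both dimensions, so the length of the signature, and therefore the number of input neurons of the FCNN, drops by a factor of four per level.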

Classification.
The FCNN has three parts, namely, the input, hidden, and output layers. As the name fully connected indicates, each neuron connects to all neurons in the next layer. The weighted sum of the inputs plus the bias must be computed before the activation function is applied. We focus on only two activation functions, namely, the nonlinear ReLU function and the softmax function. They are defined in equations (9) and (10):

$$\mathrm{ReLU}(x_i) = \max(0, x_i) \tag{9}$$

$$\mathrm{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{N} e^{x_j}} \tag{10}$$

where x_i is the sum of the inputs multiplied by the weights plus the bias, and N is the number of neurons in the output layer. The value of the ReLU function is 0 or x_i, and the value of the softmax function lies between 0 and 1 because it indicates the probability that the image belongs to each class. The feature map of the input image constitutes the input layer (size of image signature × 1), and the image class membership forms the output layer [33]. At least six hidden layers are required, and in this paper we fix the number to six. Their activation function is ReLU, and all weights are initialized randomly. This means that there are six weights w_i, whose sizes depend on h. For the experiments, the value of h is the square root of the size of the image signature and the number of classes. Concerning the output layer (number of classes × 1), the number of neurons is the same as the number of classes present in the dataset. The neuron with the highest probability value determines the predicted class. The softmax activation function ensures this probability format. Naturally, the number of neurons in the input layer equals the length of the image signature vector. The percentage of images allocated for testing is up to the researcher, but it is important that the training set be larger than the testing set. During the training phase, the target value of the output neuron corresponding to the class of the input image signature is 1, and 0 for the others.
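A forward pass through such a network can be sketched as follows, with ReLU on the six hidden layers and softmax on the output layer. The hidden width of 50, the input length of 256, the ten classes, the Gaussian initialization, and the names `init_fcnn` and `forward` are illustrative assumptions; training is not shown.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    z = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return z / z.sum()

def init_fcnn(n_in, n_hidden, n_classes, n_layers=6, seed=0):
    """Random weights and zero biases for n_layers hidden layers plus the output layer."""
    rng = np.random.default_rng(seed)
    sizes = [n_in] + [n_hidden] * n_layers + [n_classes]
    return [(rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, signature):
    """ReLU on every hidden layer, softmax on the output layer."""
    a = signature
    for w, b in params[:-1]:
        a = relu(a @ w + b)
    w, b = params[-1]
    return softmax(a @ w + b)

params = init_fcnn(n_in=256, n_hidden=50, n_classes=10)
probs = forward(params, np.random.default_rng(1).random(256))
predicted_class = int(np.argmax(probs))   # neuron with the highest probability
```

Because the topology depends only on the signature length and the number of classes, the same code applies unchanged to any of the datasets.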

Experiments
To evaluate the performance of the proposed method, we use five datasets that are used by the different studies cited in the literature review (Section 2), in order to compare their performance with ours. They are publicly available. Section 5.1 describes the content of each dataset, and Section 5.2 details the performance using image classification measurements such as accuracy [34], loss [35], precision, recall, and F1 score [36,37].
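As a reminder of how these measurements relate to each other, the sketch below computes accuracy and per-class precision, recall, and F1 score from a confusion matrix. The function name `per_class_metrics` and the toy labels are hypothetical; they are not taken from the paper's experiments.

```python
import numpy as np

def per_class_metrics(y_true, y_pred, n_classes):
    """Return overall accuracy and per-class precision, recall, and F1 arrays."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)        # rows: true class, columns: predicted class
    tp = np.diag(cm).astype(np.float64)
    predicted = cm.sum(axis=0)
    actual = cm.sum(axis=1)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1

# Toy example with three classes and five test images
accuracy, precision, recall, f1 = per_class_metrics([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], 3)
```

Classes that are never predicted get a precision of 0 here instead of a division error, which is one common convention among several.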

Dataset Description.
Caltech-101 (the dataset is available at https://www.kaggle.com/datasets/862ae86edba271c39f76d0b530edeb55076b4b82b971160637210900747c44b1) is the first image dataset that we use to test our design. It includes images of objects belonging to 101 classes plus one background clutter class. Each image is labelled with a single object, and each class contains roughly 40 to 800 images, for a total of 9146 images. We cannot show the whole content of this dataset here; however, a sample of images is presented in Figure 7 [24].
The second dataset is Caltech-256 (the dataset is available at https://www.kaggle.com/datasets/jessicali9530/caltech256) [38], which has 30607 natural images in 256 object categories and 1 random background class. The average number of images per class is 119 (ranging from 80 to 827), and the average image dimension is 371 × 326. A sample is presented in Figure 8.
The fourth dataset is CIFAR-100 (the dataset is available at https://www.cs.toronto.edu/~kriz/cifar.html), which is similar to CIFAR-10, except that it has 100 classes with 600 images each. In CIFAR-100, the 100 classes are grouped into 20 superclasses. Each image comes with two labels: a "fine" label (class) and a "coarse" label (superclass). A sample of the images in this dataset is shown in Figure 10.
The last dataset for the experiments is ImageNet (the dataset is available at https://www.image-net.org/download.php). It is a large database with more than one million images spanning 1000 object classes. The ImageNet dataset is publicly available, and a snapshot is shown in Figure 11.

Performance Measurement.
We fix the number of epochs to 2500; this value does not depend on the dataset, but it can be increased to improve the accuracy. The first experiment was done with the Caltech-101 dataset: 75% of the images are used for training, and the remaining 25% (2279 images) pass through our network for validation. This means that we test on 25% of each class. The dataset split must be the same as that used by previous studies; otherwise, the results cannot be compared. The average accuracy is around 91%, and a sample of the performance is given in Table 3. The precision is excellent when the number of images belonging to a class is not high. We also note that the accuracy becomes acceptable around the 1500th epoch, according to Figure 12. The loss converges to zero once the epoch is near 1700.
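Holding out 25% of each class, as described above, amounts to a stratified split. A minimal sketch is given below; the function name `stratified_split`, the rounding rule, and the toy label counts are assumptions, and the paper does not state how its split was drawn.

```python
import numpy as np

def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx), holding out test_fraction of every class."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))  # shuffle within the class
        n_test = int(round(test_fraction * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

# Toy labels: three classes with 40, 80, and 120 images
labels = np.repeat([0, 1, 2], [40, 80, 120])
train_idx, test_idx = stratified_split(labels)
```

Rounding is done per class, so the total number of held-out images can differ slightly from 25% of the whole dataset.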
Caltech-256 is considered an improvement on its predecessor, the Caltech-101 dataset, with new features such as larger category sizes, new and larger clutter categories, and overall increased difficulty. The accuracy is 2% lower than with Caltech-101 (Figure 12) because the number of classes is larger; however, the performance is better when the number of images in a class is large, as can be observed for the motorbikes class (Table 4). The loss value remains considerable until the end of the experiment (Figure 13). To fix this issue, it is possible to increase the number of epochs, but this will have an impact on the other parameters. To be precise, the loss function used is the cross entropy defined in equation (11):

$$\mathrm{Loss} = -\sum_{i=1}^{N} t_i \log(p_i) \tag{11}$$

where t_i is the truth label, p_i is the softmax probability for the ith class, and N is the number of image classes present in the dataset [40].

The experiment with CIFAR-10 is fast because the number of classes is small, which is why the accuracy is high from the 1000th epoch. Image resizing and HWT are not required because the images are small (32 × 32). We select 50000 images (about 83%) for training and 10000 images (about 17%) for testing. This partition is the one commonly used in previous works. As with Caltech-256, the full result is presented in Table 5, which provides the accuracy details for each class. The evolution over the epochs is shown in Figure 14. The method proposed by Sharma and Phonsa [10] was tested with 3 classes, namely, aeroplane, bird, and cat. If we limit our test to these classes, we obtain an accuracy of 99%.

Next, we test the technique with the largest image datasets, CIFAR-100 and ImageNet. The performance is reduced because these datasets have many classes and the number of test images is also smaller (Figures 14 and 15). It can be improved by increasing the number of epochs; however, this may affect the computation time.
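The cross-entropy loss between a one-hot truth vector and the softmax probabilities can be sketched as follows. The clipping constant `eps` and the example vectors are assumptions added to keep the logarithm finite and to give a concrete value.

```python
import numpy as np

def cross_entropy(t, p, eps=1e-12):
    """Cross entropy between a one-hot truth vector t and softmax probabilities p."""
    p = np.clip(np.asarray(p, dtype=np.float64), eps, 1.0)   # avoid log(0)
    return float(-np.sum(np.asarray(t) * np.log(p)))

# True class is index 1, predicted with probability 0.7
loss = cross_entropy([0, 1, 0], [0.2, 0.7, 0.1])   # equals -log(0.7)
```

With a one-hot truth vector, only the probability assigned to the true class contributes, so the loss approaches zero as that probability approaches 1.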
To support this suggestion on an embedded system, a device with a good configuration is necessary. As we see in Figure 13, the loss function starts with its highest value and becomes negligible at the end of the epochs. The cross-entropy trend for both datasets differs from that of the three previous ones. The experimental metrics are presented in Tables 6 and 7, and we notice that our accuracy is still competitive. Most image classification studies based on CNNs use ImageNet as the dataset, and we compare their performance with ours using the same device configuration, which is as follows:
(i) CPU: AMD EPYC Processor (with IBPB) (92 core)
(ii) RAM: 1.7T
(iii) GPU: Tesla A100
(iv) Batch size: 32
Apart from top-1 accuracy, we also compare the top-5 accuracy, the number of parameters, and the computation time of each method in Table 8. We note that our proposed method performs well. On another device with limited memory, such as an embedded system with 2 GB, the computation time increases but is still tolerable. The research done by Çalik and Demirci [8] is dedicated to a small dataset (CIFAR-10); however, we obtain a higher recognition rate (99.11% vs 85.9%).
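The difference between top-1 and top-5 accuracy used in this comparison can be made concrete with a short sketch: a prediction counts as correct under top-k if the true class is among the k classes with the highest probability. The function name `top_k_accuracy` and the toy probabilities are hypothetical.

```python
import numpy as np

def top_k_accuracy(probs, y_true, k=5):
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    probs = np.asarray(probs)
    y_true = np.asarray(y_true)
    top_k = np.argsort(probs, axis=1)[:, -k:]          # indices of the k largest scores
    hits = (top_k == y_true[:, None]).any(axis=1)
    return float(hits.mean())

# Toy example: three samples, three classes
probs = np.array([[0.1, 0.6, 0.3],
                  [0.5, 0.2, 0.3],
                  [0.2, 0.3, 0.5]])
y_true = np.array([1, 2, 0])
top1 = top_k_accuracy(probs, y_true, k=1)
top2 = top_k_accuracy(probs, y_true, k=2)
```

Top-k accuracy can never be lower than top-1 accuracy, which is why the two figures are reported side by side.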
Before closing this section, we compare our results with some studies that use smaller datasets such as Caltech-101, Caltech-256, CIFAR-10, and CIFAR-100 (Table 9). The proposed approach leads in performance except on Caltech-256, where it ranks second. In the tables, the symbol «−» means that the authors did not provide the information in their publications, and « * » marks the maximum value.

Discussion
Most recent research in image classification chooses the CNN as the neural network to accomplish the task. It collects the relevant information in the feature layers, which consist of convolution and pooling. Both operations reduce the volume of information to be processed, and the final information judged essential, called the image map or signature, goes through a fully connected neural network for image classification. This technique requires from thousands to millions of parameters, and the architecture changes according to the dataset to be treated. This means that the solution is complex, which may affect performance. For this reason, we propose this approach with at most 11000 parameters and a simple, static architecture, and the accuracy is improved. This result is due to the foveation produced by the PCNN. CNN-based methods classify images containing a background easily because they give importance to such information; ours, however, has a weakness here because the PCNN ignores this information, which is why the accuracy for the background class in the Caltech-101 dataset is lower (85%). This figure is the top-1 accuracy; the top-5 accuracy is 90%. Regarding the test with the CIFAR-10 image dataset, the approach proposed by Sharma and Phonsa [10] has a lower accuracy than ours, even though the number of classes is small, since the dataset has only ten classes and its images do not have large dimensions. Before concluding this paper, we summarize in Table 10 the advantages and disadvantages of each algorithm.

Research Motivation and Conclusion
Applying a CNN to image classification demands a high number of parameters, and the feature extraction layers require large computing resources to obtain an image map; this step may delay processing. So, the first motivation of this research is to propose a simple architecture and a simple static model, independent of the input image or dataset, with minimum computation time. The second motivation is to have a more efficient neural network with higher accuracy than existing image recognition algorithms. To attain these objectives, we resize the image if it has large dimensions and convert it to gray level before the PCNN and foveation processing. The resulting image goes through a three-level wavelet transformation, and the final approximation matrix is kept for the FCNN input layer. This transformation reduces the information with minimum loss. For validation, we chose five datasets, namely, Caltech-101, Caltech-256, CIFAR-10, CIFAR-100, and ImageNet; compared with existing methods on the same datasets, the proposed method performs well, especially on a small dataset such as CIFAR-10.
The PCNN remains an essential tool in the image processing area, and foveation is one application of this intelligent neural network. Besides searching for pictures in a database, this approach can be applied to face recognition and fingerprint recognition, for example. This study may be improved by replacing the PCNN model with modified pulse coupled neural networks (MPCNN) or the intersecting cortical model (ICM). Three works [42][43][44] were published recently, and they can be a source of inspiration to improve this research. In future work, we will focus on only two classes of images, namely, persons with and without a facemask, and for images with a facemask, we will check whether the mask is worn correctly.

Table 10: Advantages and disadvantages of each algorithm.
Year | Paper | Advantage | Disadvantage
2017 | [5] | (i) High processing speed; (ii) High accuracy; (iii) Able to handle big image datasets | (i) Many parameters required for training; (ii) Complex network architecture
2018 | [8] | (i) Less processing time; (ii) High accuracy with small images | The algorithm is dedicated to a small dataset like CIFAR-10; otherwise, the performance is not considerable
2019 | [6] | Highest accuracy for the face image category | (i) Long chain of processing before classification; (ii) Lowest accuracy when classifying images with variable content, like the pizza category
2021 | [9] | (i) Minimum number of epochs; (ii) High accuracy with small images | (i) A million parameters; (ii) Weakness with datasets of large images
2021 | [10] | Minimum training time | Low accuracy for a dataset with many classes
2021 | [41] | High accuracy | (i) Maximum number of parameters and epochs; (ii) High computation time
2022 | [11] | (i) Maximum quantity of information in the image signature; (ii) Medium accuracy rate (top-1) | (i) Training time of around 13 hours; (ii) Accuracy improvement observed only for the top-5 accuracy measurement
- | [12], [19][20][21] | (i) Good accuracy; (ii) Minimum computation time | (i) Too many parameters
2022 | Proposed method | (i) Minimum parameters required; (ii) Minimum computation time (2.11 milliseconds for a small image (32 × 32)); (iii) The architecture is always the same, independently of the image dataset | (i) A maximum number of epochs is required; (ii) Some difficulty classifying images with an important background

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.